ABSTRACT

Most work on foreground removal has treated the case where the frequency dependence of all components is perfectly known and independent of position. In contrast, real-world foregrounds are generally not perfectly correlated between frequencies, with the spectral index varying slightly with position and (in the case of some radio sources) with time. A method incorporating this complication is presented, and illustrated with an application to the upcoming satellite missions MAP and Planck. We find that even spectral index variations as small as $\Delta\alpha \sim 0.1$ can have a substantial impact on how channels should be combined and on attainable accuracy.



1. INTRODUCTION

Future Cosmic Microwave Background (CMB) experiments can measure many key cosmological parameters to great precision (Jungman et al. 1996; Bond et al. 1997; Zaldarriaga et al. 1997), at least in principle. To achieve this in practice, foreground contamination must be removed with comparable accuracy. Tegmark & Efstathiou (1996, hereafter "TE96") derived the foreground subtraction method that minimizes the residual variance from foregrounds and noise under the assumption that the frequency dependence of all components is perfectly known and independent of position. This method was independently derived by Bouchet & Gispert (unpublished). It has now been extensively tested with simulations (see e.g. Bouchet et al. 1995; Bersanelli et al. 1996) in which each frequency channel was the appropriate linear combination of a simulated CMB map, foreground templates such as the Haslam, DIRBE and IRAS maps, radio sources, and random noise. The inversion was found to accurately recover the input maps even though the foreground templates exhibited strong non-Gaussianity.

To further improve such modeling, one must incorporate the complication that real-world foregrounds are generally not perfectly correlated between frequencies, with the spectral index varying slightly with position and (in the case of some radio sources) with time. Such spatial variations of the spectral index have been observed for both dust (e.g., Reach et al. 1996; Schlegel et al. 1997) and synchrotron radiation (Banday & Wolfendale 1991; Platania et al. 1997), and are of course even more pronounced for point sources (e.g., Franceschini et al. 1989, 1991; Toffolatti et al. 1997). As we will see, neglecting this complication can cause severe underestimates of the residual foreground level in the cleaned CMB map. It can also produce foreground residuals substantially higher than can be obtained with the method we derive below.

2. METHOD 

As in TE96, we assume that we have sky maps at $m$ frequencies $\nu_1, \dots, \nu_m$ and that these maps receive contributions from $n$ different physical components (CMB, dust, etc.). The maps may be internal channels of a CMB experiment, but they can also include external templates such as the DIRBE maps.

2.1. Pixel by pixel or wavelet by wavelet? 

The general problem treated in this paper is how to take linear combinations of these $m$ maps to produce accurate maps of individual components. The traditional approach (e.g., Brandt et al. 1994) has been to perform this multi-frequency subtraction separately for each pixel (direction in the sky). However, this does not take advantage of the differences in smoothness between the CMB and the various foregrounds. The correlation between neighboring pixels is typically stronger for diffuse Galactic foregrounds (dust, synchrotron and free-free emission) than for the CMB, and stronger for the CMB than for point sources. One can therefore do better by performing the linear combinations mode by mode rather than pixel by pixel, using some variant of a Fourier expansion of the maps (TE96). The optimal weights for the linear combination then differ on large angular scales (where diffuse foregrounds are important) and small angular scales (dominated by point sources), as illustrated in Figure 1.

In addition to this scale dependence, there is also a direction dependence. Some pixels typically have higher average levels of noise than others (from receiving less observing time) or of foreground contamination (from being closer to the Galactic plane, say). This author therefore suggests that foreground subtraction be performed using modes that are fairly localized both in real space and in Fourier space, for instance some form of wavelets.

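The scale dependence of the optimal weights can be illustrated with a minimal Python sketch. Everything in it is a hypothetical toy setup, not the foreground model behind Figure 1:

- three channels at 30, 100 and 300 GHz with unit uncorrelated noise;
- a steep-spectrum diffuse foreground whose per-mode r.m.s. falls with multipole $\ell$;
- a flatter-spectrum point-source term whose r.m.s. rises with $\ell$.

Both foregrounds are assumed perfectly correlated between frequencies. For each mode separately, the weights use the minimum-variance form derived in §2.4 (equation (6)).

```python
import numpy as np

# Hypothetical channel frequencies in GHz (illustration only).
nu = np.array([30.0, 100.0, 300.0])
e = np.ones_like(nu)               # CMB: same temperature in every channel
f_diffuse = (nu / 100.0) ** -3.0   # assumed steep, synchrotron-like spectrum
f_sources = (nu / 100.0) ** -0.5   # assumed flatter, radio-source-like spectrum
Sigma = np.eye(nu.size)            # assumed unit, uncorrelated receiver noise


def noise_cov(ell):
    """Covariance of everything except the CMB for one mode of multipole ell (toy model)."""
    a_diffuse = 100.0 * (ell / 10.0) ** -1.5  # diffuse r.m.s. falls with ell (assumed)
    a_sources = 0.1 * (ell / 10.0)            # point-source r.m.s. rises with ell (assumed)
    return (Sigma
            + a_diffuse**2 * np.outer(f_diffuse, f_diffuse)
            + a_sources**2 * np.outer(f_sources, f_sources))


def cmb_weights(N):
    """Minimum-variance weights w = N^{-1} e / (e^t N^{-1} e), which sum to one."""
    Ninv_e = np.linalg.solve(N, e)
    return Ninv_e / (e @ Ninv_e)


w_large_scale = cmb_weights(noise_cov(10))    # diffuse foreground dominates
w_small_scale = cmb_weights(noise_cov(1000))  # point sources dominate
print("ell = 10:  ", np.round(w_large_scale, 3))
print("ell = 1000:", np.round(w_small_scale, 3))
```

On the large-scale mode, the 30 GHz channel (where the assumed diffuse foreground is brightest) gets almost no weight. On the small-scale mode, the weights must also cancel the point-source term, so the two weight vectors differ markedly.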
2.2. Notation 

All the methods discussed below can be applied regardless of which of the above-mentioned approaches is taken. Let $y_j$ denote the temperature $\delta T$ measured at the $j$-th frequency in a given direction. Alternatively, $y_j$ may refer to a given mode, in which case it is simply the corresponding multipole, Fourier, or wavelet coefficient. Let $y_j^{(i)}$ denote the contribution to $y_j$ from the $i$-th physical component. Grouping the measurements $y_j$ into an $m$-dimensional vector $\mathbf{y}$, we can thus write

$$ \mathbf{y} = \sum_{i=0}^{n} \mathbf{y}^{(i)}. \tag{1} $$

As in TE96, we assume that the different components have zero mean ($\langle\mathbf{y}^{(i)}\rangle = 0$)¹ and are uncorrelated. The data covariance matrix is then simply given by

$$ \mathbf{C} \equiv \langle\mathbf{y}\mathbf{y}^t\rangle = \sum_{i=0}^{n} \mathbf{C}^{(i)}, \tag{2} $$

where $\mathbf{C}^{(i)} \equiv \langle\mathbf{y}^{(i)}\mathbf{y}^{(i)t}\rangle$ is the covariance matrix of the $i$-th component. It is convenient to factor this as $C^{(i)}_{jk} = R^{(i)}_{jk}\sigma^{(i)}_j\sigma^{(i)}_k$. Here the standard deviation and correlation are defined by $\sigma^{(i)}_j \equiv [C^{(i)}_{jj}]^{1/2}$ and $R^{(i)}_{jk} \equiv C^{(i)}_{jk}/\sigma^{(i)}_j\sigma^{(i)}_k$, respectively.

For definiteness, let us take component 0 to be the CMB. The CMB temperature is the same in all channels, equal to $x$, say, so $\mathbf{y}^{(0)} = \mathbf{e}x$, where the constant vector $\mathbf{e}$ is defined by $e_j = 1$. We therefore have $\sigma^{(0)}_j = \langle x^2\rangle^{1/2}$, independent of $j$, and the correlation matrix is $\mathbf{R}^{(0)} = \mathbf{E}$. Here $\mathbf{E} \equiv \mathbf{e}\mathbf{e}^t$ is a matrix consisting entirely of ones.

Let us take component 1 to be the instrumental noise. Then $\sigma^{(1)}_j$ is simply the r.m.s. noise level in the $j$-th channel. If the noise is uncorrelated between channels, we have $\mathbf{R}^{(1)} = \mathbf{I}$, the identity matrix. The remaining components (the various foregrounds) will typically have correlation matrices $\mathbf{R}^{(i)}$ that are intermediate between these two extreme cases of perfect correlation ($\mathbf{R} = \mathbf{E}$) and no correlation ($\mathbf{R} = \mathbf{I}$).

Tegmark (1997, hereafter "T97") compared ten different methods for making CMB maps from time-ordered data. The CMB foreground removal problem is quite analogous to the mapmaking problem, in that one seeks a linear inversion given certain assumptions about the "noise". Indeed, all of the inversion methods described in T97 can be used for foreground removal as well, and we will repeatedly return to these connections below.

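A minimal Python sketch of the factorization $C^{(i)}_{jk} = R^{(i)}_{jk}\sigma^{(i)}_j\sigma^{(i)}_k$ for the two limiting cases above: the CMB ($\mathbf{R} = \mathbf{E}$) and uncorrelated noise ($\mathbf{R} = \mathbf{I}$). The channel count and r.m.s. values are illustrative assumptions, and foreground terms are omitted from the sum in equation (2).

```python
import numpy as np


def component_cov(sigma, R):
    """Covariance of one component from its per-channel r.m.s. and correlation matrix:
    C_jk = R_jk * sigma_j * sigma_k."""
    sigma = np.asarray(sigma, dtype=float)
    return np.asarray(R, dtype=float) * np.outer(sigma, sigma)


m = 4                      # number of channels (illustrative)
E = np.ones((m, m))        # perfect correlation between channels
I = np.eye(m)              # no correlation between channels

C_cmb = component_cov(np.full(m, 50.0), E)             # assumed CMB r.m.s. of 50 uK in every channel
C_noise = component_cov([20.0, 15.0, 10.0, 30.0], I)   # assumed per-channel noise r.m.s. in uK
C = C_cmb + C_noise        # equation (2) without foreground terms

# Inverting the factorization recovers sigma and R from a covariance matrix.
sigma_cmb = np.sqrt(np.diag(C_cmb))
R_cmb = C_cmb / np.outer(sigma_cmb, sigma_cmb)
```

The CMB term has rank one. This is the property that lets the eigenvalue problem of §2.4 reduce to a matrix inversion.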
2.3. A signal-to-noise eigenvalue problem 

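Let us consider an arbitrary linear combination of the channels,

$$ \hat{x} = \mathbf{w}\cdot\mathbf{y}, \tag{3} $$

specified by some $m$-dimensional weight vector $\mathbf{w}$. If we want $\hat{x}$ to estimate the $i$-th component, then $\mathbf{y}^{(i)}$ is our signal and all the other components act as noise. Let $\mathbf{N}$ denote the covariance matrix of this generalized "noise", i.e.,

$$ \mathbf{N} \equiv \sum_{i' \neq i} \mathbf{C}^{(i')}. \tag{4} $$

The contributions to the variance $\langle\hat{x}^2\rangle$ of our estimator $\hat{x}$ from signal and noise are $\mathbf{w}^t\mathbf{C}^{(i)}\mathbf{w}$ and $\mathbf{w}^t\mathbf{N}\mathbf{w}$, respectively. We maximize the signal-to-noise ratio by maximizing $\mathbf{w}^t\mathbf{C}^{(i)}\mathbf{w}$ with $\mathbf{w}^t\mathbf{N}\mathbf{w}$ held fixed. This shows that $\mathbf{w}$ is a solution to the generalized eigenvalue problem

$$ \mathbf{C}^{(i)}\mathbf{w} = \lambda\mathbf{N}\mathbf{w}. \tag{5} $$

This is analogous to the signal-to-noise eigenmode method (Bond 1995; Bunn & Sugiyama 1995; Tegmark et al. 1997) used in CMB power spectrum analysis. The difference is that the data set $\mathbf{y}$ is now the measurement at different frequencies rather than at different positions in the sky. The $m$ different eigenvectors $\mathbf{w}$ give $m$ uncorrelated estimators $\hat{x}$, the least noisy one being that corresponding to the largest eigenvalue $\lambda$.

2.4. TE96 as a special case

Throughout the rest of this paper, we limit our attention to estimating component 0, the CMB. Here $\mathbf{C}^{(0)} \propto \mathbf{E}$ is a matrix of rank 1, with only one non-zero eigenvalue, which corresponds to the eigenvector $\mathbf{e}$. The eigenvalue problem therefore reduces to a simple matrix inversion for this case. Equation (5) gives $\mathbf{N}\mathbf{w} \propto \mathbf{E}\mathbf{w} = \mathbf{e}(\mathbf{e}^t\mathbf{w}) \propto \mathbf{e}$, so $\mathbf{w} \propto \mathbf{N}^{-1}\mathbf{e}$. Normalizing $\mathbf{w}$ so that $\mathbf{w}^t\mathbf{E}\mathbf{w} = 1$, we obtain

$$ \mathbf{w} = \frac{\mathbf{N}^{-1}\mathbf{e}}{\mathbf{e}^t\mathbf{N}^{-1}\mathbf{e}}. \tag{6} $$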

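Numerically, the reduction of equation (5) to equation (6) can be checked with the short Python sketch below. It solves the generalized problem $\mathbf{C}^{(0)}\mathbf{w} = \lambda\mathbf{N}\mathbf{w}$ by Cholesky whitening, one standard approach and not necessarily the one used in the paper. The positive-definite $\mathbf{N}$ is arbitrary and randomly generated, purely for illustration. Because $\mathbf{C}^{(0)} \propto \mathbf{E}$ has rank one, only one eigenvalue is non-zero. Its eigenvector, once normalized to $\mathbf{e}\cdot\mathbf{w} = 1$, should coincide with $\mathbf{N}^{-1}\mathbf{e}/(\mathbf{e}^t\mathbf{N}^{-1}\mathbf{e})$.

```python
import numpy as np


def snr_eigenproblem(C_signal, N):
    """Solve C_signal w = lambda N w (equation (5)) for symmetric C_signal and
    positive-definite N, by whitening with the Cholesky factor N = L L^t."""
    L = np.linalg.cholesky(N)
    L_inv = np.linalg.inv(L)
    lam, U = np.linalg.eigh(L_inv @ C_signal @ L_inv.T)  # ordinary symmetric problem
    W = L_inv.T @ U                                      # back-transform: w = L^{-t} u
    return lam, W                                        # eigenvalues in ascending order


rng = np.random.default_rng(0)
m = 5
e = np.ones(m)
A = rng.normal(size=(m, m))
N = A @ A.T + m * np.eye(m)    # arbitrary positive-definite "noise" covariance (illustrative)
C_cmb = 2.0 * np.outer(e, e)   # CMB covariance, proportional to E = e e^t (rank one)

lam, W = snr_eigenproblem(C_cmb, N)
w_top = W[:, -1] / (e @ W[:, -1])   # largest-eigenvalue eigenvector, scaled so e . w = 1

Ninv_e = np.linalg.solve(N, e)
w_eq6 = Ninv_e / (e @ Ninv_e)       # equation (6)
print(np.round(lam, 6))
print(np.allclose(w_top, w_eq6))
```

The single non-zero eigenvalue equals $2\,\mathbf{e}^t\mathbf{N}^{-1}\mathbf{e}$ here, as follows from substituting $\mathbf{w} \propto \mathbf{N}^{-1}\mathbf{e}$ into equation (5).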
The normalization in equation (6) corresponds to $\sum_j w_j = \mathbf{e}\cdot\mathbf{w} = 1$. We can therefore interpret $\hat{x}$ as simply a weighted average of the $m$ channels, with $\mathbf{w}$ giving the weights (some weights may be negative).

¹ Another advantage of working with modes rather than pixels is that it can eliminate the nuisance of a non-zero mean (most foregrounds are strictly non-negative). In a Fourier or multipole expansion, all modes but the (irrelevant) monopole will have a vanishing average.

In TE96, we assumed that the frequency dependence of each component was independent of position and time. This means that the map of a component looks the same at all frequencies, apart from an overall frequency-dependent scale factor. The assumption is therefore equivalent to saying that each component is perfectly correlated between frequencies, i.e., that $\mathbf{R}^{(i)} = \mathbf{E}$ except for $i = 1$, the instrumental noise component.

Following the notation of TE96, let $\mathbf{S}$ denote the diagonal covariance matrix of the different components at some fiducial frequency $\nu_*$ (say 100 GHz). Let $F_{ji}$ specify the r.m.s. of the $i$-th component at the $j$-th frequency relative to the value at $\nu_*$. The correspondence between TE96 and our equations is then given by $\sigma^{(i)}_j = F_{ji}S_{ii}^{1/2}$. We define $\boldsymbol{\Sigma} \equiv \mathbf{C}^{(1)}$ and $\mathbf{N}_{fg} \equiv \sum_{i \geq 2}\mathbf{C}^{(i)}$ as the contributions to the covariance matrix $\mathbf{N}$ from receiver noise and foregrounds, respectively. We can thus write

$$ \mathbf{N} = \mathbf{N}_{fg} + \boldsymbol{\Sigma} = \mathbf{F}\mathbf{S}\mathbf{F}^t + \boldsymbol{\Sigma}, \tag{7} $$

where in this expression $\mathbf{F}$ and $\mathbf{S}$ are restricted to the foreground components ($i \geq 2$).

TE96 reconstructed all components, not merely the CMB, with an estimator of the form $\hat{\mathbf{x}} = \mathbf{W}\mathbf{y}$. Since $\mathbf{R}^{(i)} = \mathbf{E}$ for all the foregrounds, equation (5) gives a single eigenvector with $\lambda > 0$ for estimating each one, just as for the CMB component. Arranging these vectors $\mathbf{w}$ as the rows of the matrix $\mathbf{W}$ and performing the relevant algebra, we obtain

$$ \mathbf{W} = \boldsymbol{\Lambda}\mathbf{F}^t\left[\mathbf{F}\mathbf{S}\mathbf{F}^t + \boldsymbol{\Sigma}\right]^{-1}. \tag{8} $$

Here $\mathbf{F}$ and $\mathbf{S}$ now include all sky components. The diagonal matrix $\Lambda_{jk} = \delta_{jk}/(\mathbf{F}^t[\mathbf{F}\mathbf{S}\mathbf{F}^t + \boldsymbol{\Sigma}]^{-1}\mathbf{F})_{jj}$ normalizes each row so that $(\mathbf{W}\mathbf{F})_{jj} = 1$. This is equation (36) of TE96. We have thus generalized the TE96 result, and found that it corresponds to the special case of perfect foreground correlations, $\mathbf{R}^{(i)} = \mathbf{E}$. Conversely, it is easy to show that
our method (6) can be derived from the TE96 method by replacing each foreground component with many subcomponents, each with a slightly different frequency dependence, as described in §5.4 of TE96.

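The TE96 special case can be sketched the same way. The Python snippet below assumes a hypothetical six-channel experiment. Its $\mathbf{F}$ has a CMB column of ones plus a synchrotron-like and a dust-like column with made-up spectral indices, and its diagonal $\mathbf{S}$ and $\boldsymbol{\Sigma}$ are also made up. The snippet evaluates equation (8), with $\boldsymbol{\Lambda}$ fixed by $(\mathbf{W}\mathbf{F})_{jj} = 1$. It then checks that the CMB row agrees with equation (6) when $\mathbf{N}$ contains only the foreground and noise terms.

```python
import numpy as np


def te96_weights(F, S, Sigma):
    """Equation (8): W = Lambda F^t [F S F^t + Sigma]^{-1}, where the diagonal
    Lambda rescales each row so that (W F)_jj = 1."""
    M = F.T @ np.linalg.inv(F @ S @ F.T + Sigma)
    Lam = np.diag(1.0 / np.diag(M @ F))
    return Lam @ M


nu = np.array([30.0, 44.0, 70.0, 100.0, 143.0, 217.0])   # hypothetical channels [GHz]
F = np.column_stack([np.ones_like(nu),          # CMB (column 0)
                     (nu / 100.0) ** -3.0,      # assumed synchrotron-like scaling
                     (nu / 100.0) ** 2.0])      # assumed dust-like scaling
S = np.diag([30.0**2, 10.0**2, 5.0**2])          # assumed variances at 100 GHz
Sigma = np.diag((5.0 + 0.1 * nu) ** 2)           # assumed receiver-noise variances

W = te96_weights(F, S, Sigma)

# CMB row versus equation (6), with N built from the foreground columns plus noise.
e = np.ones_like(nu)
N = F[:, 1:] @ S[1:, 1:] @ F[:, 1:].T + Sigma
Ninv_e = np.linalg.solve(N, e)
w_eq6 = Ninv_e / (e @ Ninv_e)
print(np.round(W[0], 4))
print(np.allclose(W[0], w_eq6))
```

The agreement follows from the Sherman-Morrison identity: the full covariance in equation (8) differs from $\mathbf{N}$ only by the rank-one CMB term, which does not change the direction of $[\cdot]^{-1}\mathbf{e}$.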
3. THE TRADEOFF BETWEEN FOREGROUNDS AND NOISE 

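Our discussion above has illustrated that foregrounds are very much like detector noise: they are simply more correlated between channels. When choosing $\mathbf{w}$ to make a CMB map, there is generally a tradeoff between two quantities:

- the amount of residual noise, $\sigma_n \equiv [\mathbf{w}^t\boldsymbol{\Sigma}\mathbf{w}]^{1/2}$;
- the residual foreground contamination, $\sigma_{fg} \equiv [\mathbf{w}^t\mathbf{N}_{fg}\mathbf{w}]^{1/2}$.

This is clearly seen if we minimize $\sigma_{fg}$ for some fixed level of noise $\sigma_n$, maintaining our normalization constraint $\mathbf{e}\cdot\mathbf{w} = 1$. We solve this constrained minimization problem by introducing Lagrange multipliers $\gamma$ and $\lambda$. It then corresponds to minimizing the expression $\mathbf{w}^t[\mathbf{N}_{fg} + \gamma\boldsymbol{\Sigma}]\mathbf{w} - \lambda\,\mathbf{e}\cdot\mathbf{w}$, which gives $\mathbf{w} \propto [\mathbf{N}_{fg} + \gamma\boldsymbol{\Sigma}]^{-1}\mathbf{e}$. We recognize this as our solution in equation (6), but with $\mathbf{N} = \mathbf{N}_{fg} + \boldsymbol{\Sigma}$ replaced by $\mathbf{N}_{fg} + \gamma\boldsymbol{\Sigma}$. In other words, the noise covariance is rescaled relative to its true value by a factor $\gamma$. For the TE96 case, this gives

$$ \mathbf{W} = \boldsymbol{\Lambda}\mathbf{F}^t\left[\mathbf{F}\mathbf{S}\mathbf{F}^t + \gamma\boldsymbol{\Sigma}\right]^{-1}, \tag{9} $$

which corresponds to "Method 8" in the method table of T97. Below we will see that this free parameter $\gamma$ can be chosen to indicate how concerned we are about $\sigma_n$ relative to $\sigma_{fg}$.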
Table 1. Specifications used for COBE, MAP and Planck.

Figure 2 shows the result of applying equation (9) to the three satellite experiments COBE, MAP and Planck, with the lines corresponding to $\gamma$ ranging from 0 to $\infty$. Here we have used four foreground components ($n = 5$): dust, free-free emission, synchrotron radiation and point sources. These are modeled as in TE96, with some minor updates shown in Figure 1 to reflect recent foreground measurements (Bersanelli et al. 1996; Kogut et al. 1996ab; de Oliveira-Costa et al. 1997; Toffolatti et al. 1997). We have used the experimental specifications shown in Table 1. These are taken from Bennett et al. (1996) and the MAP and Planck web sites (http://map.gsfc.nasa.gov and http://tonno.tesre.bo.cnr.it/research/planck/tab-sens.htm). LFI and HFI refer to the low and high frequency instruments on board Planck.



Our derivation above showed that no method can give a point $(\sigma_n, \sigma_{fg})$ below or to the left of this line. That is, equation (9) minimizes the foreground residual for any given noise level $\sigma_n$. The original TE96 method ($\gamma = 1$, indicated by a solid square) corresponds to minimizing the total residual variance $\sigma_n^2 + \sigma_{fg}^2$. In a linear-linear plot, the TE96 point therefore lies where the line is closest to the origin. As we increase $\gamma$, the algorithm cares more about reducing noise and less about foregrounds. The upper endpoint of the line, with $\gamma = \infty$, corresponds to $\mathbf{w} \propto \boldsymbol{\Sigma}^{-1}\mathbf{e}$. If the detector noise is uncorrelated ($\mathbf{R}^{(1)} = \mathbf{I}$), this is a simple minimum-variance weighting that ignores the foregrounds. For COBE, this extreme case is seen to minimize the total residuals as well. Indeed, most published analyses of the COBE data made this choice, abstaining from foreground subtraction.

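The tradeoff curve can be traced with a short Python sketch. It sweeps $\gamma$ in $\mathbf{w} \propto [\mathbf{N}_{fg} + \gamma\boldsymbol{\Sigma}]^{-1}\mathbf{e}$ and records $(\sigma_n, \sigma_{fg})$. The channels, spectra and amplitudes are hypothetical, not the COBE, MAP or Planck specifications of Table 1, and the foregrounds are taken to be perfectly correlated between frequencies. As $\gamma$ grows, $\sigma_n$ should fall and $\sigma_{fg}$ should rise. The total $\sigma_n^2 + \sigma_{fg}^2$ should be smallest at $\gamma = 1$, the TE96 point.

```python
import numpy as np

nu = np.array([30.0, 44.0, 70.0, 100.0, 143.0, 217.0])   # hypothetical channels [GHz]
e = np.ones_like(nu)
F_fg = np.column_stack([(nu / 100.0) ** -3.0,            # assumed synchrotron-like spectrum
                        (nu / 100.0) ** 2.0])            # assumed dust-like spectrum
S_fg = np.diag([10.0**2, 5.0**2])                        # assumed foreground variances
N_fg = F_fg @ S_fg @ F_fg.T                              # perfectly correlated foregrounds
Sigma = np.diag((5.0 + 0.1 * nu) ** 2)                   # assumed receiver-noise variances


def residuals(gamma):
    """(sigma_n, sigma_fg) for w proportional to [N_fg + gamma Sigma]^{-1} e, normalized to e . w = 1."""
    x = np.linalg.solve(N_fg + gamma * Sigma, e)
    w = x / (e @ x)
    return np.sqrt(w @ Sigma @ w), np.sqrt(w @ N_fg @ w)


gammas = np.logspace(-3, 3, 61)                   # gamma from 1e-3 to 1e3
curve = np.array([residuals(g) for g in gammas])  # columns: sigma_n, sigma_fg
total = curve[:, 0] ** 2 + curve[:, 1] ** 2
sn_te96, sfg_te96 = residuals(1.0)                # the TE96 point
print("TE96 point:", round(sn_te96, 3), round(sfg_te96, 3))
print("gamma with smallest total residual:", gammas[np.argmin(total)])
```

The monotonic behavior is general. Comparing the two objective functions at any pair $\gamma_1 < \gamma_2$ shows that $\sigma_n$ cannot increase and $\sigma_{fg}$ cannot decrease as $\gamma$ grows.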



Figure 2. [Horizontal axis: noise rms (μK).] The curves show the smallest residual foreground level attainable for a given noise level, assuming that the frequency dependence of the foregrounds is perfectly known. The total r.m.s. residual $(\sigma_n^2 + \sigma_{fg}^2)^{1/2}$ is minimized at the solid squares (the TE96 method). Arrows point to $(\sigma_n, \sigma_{fg})$ for the marginalization method, which is seen to give $\sigma_{fg} = 0$ (off the scale), but occasionally at a higher noise cost than necessary. These curves are for a mode with $\ell = 10$. Because of their differences in angular resolution, the experiments differ more dramatically for larger $\ell$ as well as on a pixel-by-pixel basis.

There may be good reasons to be more concerned about foregrounds than about detector noise. For instance, foregrounds tend to be non-Gaussian, and we are usually unable to model their frequency and scale dependence as accurately as for detector noise. In this vein, if we decrease $\gamma$, we move downward along the curve.

Complete foreground removal is possible when the number of channels equals or exceeds the number of components (as for HFI and MAP). In that case $\sigma_{fg} \to 0$ as $\gamma \to 0$, corresponding to a weighting where $\mathbf{w}$ is the first row of $[\mathbf{F}^t\boldsymbol{\Sigma}^{-1}\mathbf{F}]^{-1}\mathbf{F}^t\boldsymbol{\Sigma}^{-1}$ ("Method 3" of T97). For these cases, consider the factor by which $\sigma_n$ increases as we go from no foreground removal (upper endpoint) to complete foreground removal (lower endpoint). This is the Foreground Degradation Factor (FDF) introduced by Dodelson (1996).

The arrows at the bottom show the residual noise for the marginalization method derived by Dodelson (1996), in which $\mathbf{w}$ is the first row of $[\mathbf{F}^t\mathbf{F}]^{-1}\mathbf{F}^t$ ("Method 2" in T97). This method is seen to give an FDF that is about a factor of 2 larger for the HFI ($m = 6$) and HFI+LFI ($m = 10$) cases, but identical to that of the TE96 method for MAP ($m = 5$). The reason is that when there are more channels than components ($m > n$), there are $m - n$ degrees of freedom left in $\mathbf{w}$ after we have required that the foregrounds be eliminated and imposed the normalization constraint. The TE96 formula uses these extra degrees of freedom to minimize $\sigma_n$.

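The difference between "Method 3" and the marginalization "Method 2" for $m > n$ can be seen in a minimal Python sketch. It assumes a hypothetical six-channel experiment whose $\mathbf{F}$ has a CMB column and two foreground columns with made-up spectral indices. The noise variances are made up and deliberately unequal. Both weightings should satisfy $\mathbf{w}^t\mathbf{F} = (1, 0, 0)$, i.e., keep the CMB and null both foregrounds. Only Method 3, however, uses the $m - n$ spare degrees of freedom to reduce noise, so its $\sigma_n$ should never exceed that of Method 2.

```python
import numpy as np


def method3_weights(F, Sigma):
    """gamma -> 0 limit ("Method 3" of T97): first row of [F^t Sigma^-1 F]^-1 F^t Sigma^-1."""
    Sigma_inv = np.linalg.inv(Sigma)
    return np.linalg.solve(F.T @ Sigma_inv @ F, F.T @ Sigma_inv)[0]


def method2_weights(F):
    """Marginalization ("Method 2" of T97): first row of [F^t F]^-1 F^t."""
    return np.linalg.solve(F.T @ F, F.T)[0]


nu = np.array([30.0, 44.0, 70.0, 100.0, 143.0, 217.0])   # hypothetical channels [GHz], m = 6
F = np.column_stack([np.ones_like(nu),                    # CMB (column 0)
                     (nu / 100.0) ** -3.0,                # assumed synchrotron-like spectrum
                     (nu / 100.0) ** 2.0])                # assumed dust-like spectrum; n = 3
Sigma = np.diag([4.0, 9.0, 1.0, 16.0, 1.0, 25.0])         # assumed, deliberately unequal noise variances

w3, w2 = method3_weights(F, Sigma), method2_weights(F)
sigma_n3 = np.sqrt(w3 @ Sigma @ w3)
sigma_n2 = np.sqrt(w2 @ Sigma @ w2)
print("Method 3:", np.round(w3 @ F, 10), "sigma_n =", round(sigma_n3, 3))
print("Method 2:", np.round(w2 @ F, 10), "sigma_n =", round(sigma_n2, 3))

# With as many channels as components (m = n) there is no freedom left,
# and the two methods coincide (both reduce to the first row of F^{-1}).
F_sq, Sigma_sq = F[:3], Sigma[:3, :3]
```

Method 3 is the noise-weighted (generalized) least-squares solution. By the Gauss-Markov argument, it has the smallest noise variance among all weightings that satisfy the same constraints.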
The reasons above for trying to reduce $\sigma_{fg}$ below $\sigma_n$ (which might require $\gamma < 1$) may be valid, but they are not grounds for outright foreground paranoia. Attempts to push $\sigma_{fg}$ down to, say, a factor of ten below $\sigma_n$ are probably overkill and not worth the heavy cost in terms of increased noise. Most importantly, as we will see in the next section, such attempts are likely to be misleading, since even tiny departures from perfect correlations can reintroduce non-negligible foreground residuals. The $\gamma = 0$ method has been advocated as conservative (Dodelson & Stebbins 1994; Dodelson 1996), since it requires no assumptions about the amplitude of the foreground fluctuations. We will see, however, that it is quite sensitive to assumptions about their frequency dependence.



4. THE EFFECT OF FREQUENCY COHERENCE 

Figure 2 assumed perfect foreground correlations, $\mathbf{R}^{(i)} = \mathbf{E}$ for $i > 1$. We will now relax this assumption.



4.1. A toy model 



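To illustrate the qualitative changes that occur, let us derive a simple toy model in which we can relate the correlation matrices $\mathbf{R}^{(i)}$ to more familiar quantities. Given some foreground component $i$ and two frequencies $\nu_j$ and $\nu_k$, we define

- $\phi_- \equiv y^{(i)}_j$ and $\phi_+ \equiv y^{(i)}_k$, the brightness of a pixel at the two frequencies;
- $r_\nu \equiv \nu_k/\nu_j$;
- $\phi \equiv (\phi_+\phi_-)^{1/2}$, the (geometric) mean brightness;
- $\alpha \equiv \ln(\phi_+/\phi_-)/\ln r_\nu$, the "color".

The color is the spectral index for which a power-law spectrum $\phi(\nu) \propto \nu^\alpha$ would connect $\phi_-$ with $\phi_+$. With this notation, we have

$$ \phi_\pm = \phi\, r_\nu^{\pm\alpha/2}. \tag{10} $$

Let us make the simplifying assumption that the brightness $\phi$ and the color $\alpha$ are statistically independent. This approximation is probably not very accurate. It is motivated by the fact that $\phi$ depends strongly on color-independent quantities: the distance (for radio sources) and the amount of emitting material along the line of sight (for the diffuse foreground components). Using this independence gives

$$ \langle\phi_\pm\rangle = \langle\phi\rangle\langle r_\nu^{\pm\alpha/2}\rangle, \tag{11} $$

$$ \langle\phi_\pm^2\rangle = \langle\phi^2\rangle\langle r_\nu^{\pm\alpha}\rangle, \tag{12} $$

$$ \langle\phi_+\phi_-\rangle = \langle\phi^2\rangle. \tag{13} $$

We define the means and standard deviations $\bar\alpha \equiv \langle\alpha\rangle$, $\bar\phi \equiv \langle\phi\rangle$, $\Delta\alpha \equiv (\langle\alpha^2\rangle - \bar\alpha^2)^{1/2}$ and $\Delta\phi \equiv (\langle\phi^2\rangle - \bar\phi^2)^{1/2}$. Let us also assume that $\Delta\alpha\ln r_\nu \ll 1$, so that a fairly definite spectral index will be apparent in a scatter plot of $\ln\phi_+$ against $\ln\phi_-$. Taylor expanding the exponential, this allows us to make the approximations

$$ \langle r_\nu^{\pm\alpha}\rangle = r_\nu^{\pm\bar\alpha}\langle e^{\pm(\alpha-\bar\alpha)\ln r_\nu}\rangle \approx r_\nu^{\pm\bar\alpha}e^{(\Delta\alpha\ln r_\nu)^2/2}, \qquad \langle r_\nu^{\pm\alpha/2}\rangle \approx r_\nu^{\pm\bar\alpha/2}e^{(\Delta\alpha\ln r_\nu)^2/8}. $$

Substituting these into equations (11)-(13), we can compute the standard deviations $\Delta\phi_\pm \equiv (\langle\phi_\pm^2\rangle - \langle\phi_\pm\rangle^2)^{1/2}$ and the correlation. We find that $\langle\phi_+\rangle/\langle\phi_-\rangle \approx \Delta\phi_+/\Delta\phi_- \approx r_\nu^{\bar\alpha}$. As expected, the mean brightness and the r.m.s. fluctuations scale in the same way with frequency. The correlation coefficient is given by

$$ R^{(i)}_{jk} = \frac{\langle\phi_+\phi_-\rangle - \langle\phi_+\rangle\langle\phi_-\rangle}{\Delta\phi_+\,\Delta\phi_-} \approx e^{-\frac{1}{2}(\ln r_\nu/\xi)^2}, \tag{14} $$

where

$$ \xi \equiv \frac{1}{\Delta\alpha\,(1+\beta^2)^{1/2}} \tag{15} $$

and $\beta \equiv \bar\phi/\Delta\phi$ is the ratio of the mean brightness to the r.m.s. fluctuations.

We will call the parameter $\xi$ the frequency coherence, since it determines how many powers of $e$ we can change the frequency by before the correlation starts breaking down. The two limits $\xi \to 0$ and $\xi \to \infty$ correspond to the two extreme cases $\mathbf{R}^{(i)} = \mathbf{I}$ and $\mathbf{R}^{(i)} = \mathbf{E}$ that we encountered above. The temperature in a foreground map typically ranges from its maximum down to values near zero, so the r.m.s. fluctuations $\Delta\phi$ are of the same order of magnitude as the mean $\bar\phi$. The ratio $\beta$ is therefore usually of order unity, and we arrive at the following useful rule of thumb: The frequency coherence is of the order of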
the inverse spectral index dispersion. 
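A quick Monte Carlo check of equations (10)-(15) can be written as a minimal Python sketch under illustrative assumptions:

- the brightness $\phi$ is drawn from an exponential (gamma, shape 1) distribution, so $\beta \approx 1$;
- the color $\alpha$ is Gaussian with $\bar\alpha = -2.8$ and $\Delta\alpha = 0.1$;
- the two are independent, as in the toy model;
- $r_\nu = 3$.

The measured correlation of $\phi_+$ and $\phi_-$ should agree with the Gaussian form of equation (14) to within sampling error.

```python
import numpy as np

rng = np.random.default_rng(1)
n_pix = 1_000_000
r_nu = 3.0                      # frequency ratio nu_k / nu_j (assumed)
alpha_bar, d_alpha = -2.8, 0.1  # assumed mean and dispersion of the spectral index
shape = 1.0                     # gamma-distributed brightness with shape 1, so beta ~ 1 (assumed)

phi = rng.gamma(shape, 1.0, n_pix)             # geometric-mean brightness, independent of color
alpha = rng.normal(alpha_bar, d_alpha, n_pix)  # "color" (spectral index) of each pixel
phi_minus = phi * r_nu ** (-alpha / 2)         # equation (10)
phi_plus = phi * r_nu ** (alpha / 2)

measured_R = np.corrcoef(phi_plus, phi_minus)[0, 1]

beta = phi.mean() / phi.std()                  # ratio of mean brightness to r.m.s. fluctuations
xi = 1.0 / (d_alpha * np.sqrt(1.0 + beta**2))  # frequency coherence, equation (15)
predicted_R = np.exp(-0.5 * (np.log(r_nu) / xi) ** 2)   # equation (14)
print(round(measured_R, 5), round(predicted_R, 5))
```

The mean color $\bar\alpha$ only rescales $\phi_\pm$ by constants, so it drops out of the correlation coefficient, as in equation (14).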
Figure 3. Same as Figure 2, but for Planck HFI+LFI only, varying the frequency coherence $\xi$. The five dashed (unlabeled) curves show the result of assuming $\xi = \infty$ when in fact $\xi = 0$, 2, 10, 100 and 10* (from top to bottom).

Figure 3 shows how the Planck results from Figure 2 ($\xi = \infty$) change when $\xi$ is reduced. The solid squares correspond to equation (6). The tradeoff curves are generated by rescaling the receiver noise contribution to $\mathbf{N}$ in equation (6) by different constants. We see that these curves follow the $\xi = \infty$ curve down from the top, then branch off to the right at a foreground level that depends on $\xi$. Reducing the foreground residual below this level becomes extremely costly in terms of extra noise (giving a large FDF, in Dodelson's terminology). Complete foreground removal is of course impossible when $\xi < \infty$. The dashed curves show what happens if the subtraction method assumes ideal ($\xi = \infty$) foregrounds. It is then disastrous to be too greedy and try to push $\sigma_{fg}$ far below $\sigma_n$, since this can actually make things worse!

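The danger of greed can be reproduced with a small Python sketch. It assumes a hypothetical three-channel experiment with unit noise and a single steep-spectrum foreground. The foreground's true correlation matrix follows equation (14) with $\xi = 10$. The sketch computes two sets of weights:

- "greedy" weights from equation (9) with a small $\gamma$ under the ideal $\mathbf{R} = \mathbf{E}$ model;
- the weights of equation (6) built from the true covariance.

It then compares the greedy weights' true foreground residual with what the ideal model predicts, and their total true residual with that of the equation (6) weights. All numbers are illustrative.

```python
import numpy as np

nu = np.array([30.0, 100.0, 300.0])   # hypothetical channels [GHz]
e = np.ones_like(nu)
f = (nu / 100.0) ** -3.0              # assumed mean foreground spectrum
A = 30.0                              # assumed foreground r.m.s. at 100 GHz
Sigma = np.eye(nu.size)               # assumed unit receiver noise
xi = 10.0                             # assumed true frequency coherence

lnu = np.log(nu)
R_true = np.exp(-0.5 * ((lnu[:, None] - lnu[None, :]) / xi) ** 2)  # equation (14)
N_fg_ideal = A**2 * np.outer(f, f)    # what a xi = infinity model assumes (R = E)
N_fg_true = N_fg_ideal * R_true       # actual foreground covariance (elementwise product)


def weights(N):
    """w = N^{-1} e / (e^t N^{-1} e)."""
    x = np.linalg.solve(N, e)
    return x / (e @ x)


def report(w):
    """Noise variance plus foreground variance predicted by R = E and by the true R."""
    a = A * w * f                                # per-channel foreground response
    return {"noise": w @ Sigma @ w,
            "fg_predicted": a.sum() ** 2,        # a^t E a under the ideal model
            "fg_true": a @ R_true @ a}           # a^t R a with the true correlations


w_greedy = weights(N_fg_ideal + 1e-3 * Sigma)    # equation (9) with gamma = 1e-3, ideal model
w_opt = weights(N_fg_true + Sigma)               # equation (6) with the true N
greedy, opt = report(w_greedy), report(w_opt)
print(greedy)
print(opt)
```

Under the ideal model, the greedy weights appear to remove the foreground almost perfectly. Their true foreground residual, however, is many orders of magnitude larger, and their total true residual exceeds that of the equation (6) weights.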
Note that since we assumed that $\Delta\alpha\ln r_\nu \ll 1$, our derivation of equation (14) only applies to the first three terms in a Taylor expansion. It shows that $R^{(i)}_{jk} = f(\ln r_\nu/\xi)$, where $f(x) = 1 - x^2/2 + \dots$. We recomputed Figure 3 for a variety of such functions of the form $f(x) = (1 + x^2/2n)^{-n}$: $n = \infty$ gives the Gaussian of equation (14), $n = 1$ gives a Lorentzian, and so on. We found that the shape of the far wings of $f$ is only of secondary importance. The main question is how correlated neighboring channels are, which for $\xi \gg 1$ depends mainly on the curvature of $f$ near the origin. Narrower wings (larger $n$) can occasionally help slightly, just as $\xi = 0$ is better than $\xi = 1$ in Figure 3.

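The family of correlation profiles above can be compared with a minimal Python sketch. It evaluates $f(x) = (1 + x^2/2n)^{-n}$ for a few values of $n$ alongside the Gaussian limit $e^{-x^2/2}$, and estimates the curvature at the origin by finite differences. The sample points and the values of $n$ are arbitrary choices for illustration.

```python
import numpy as np


def f_n(x, n):
    """Correlation profile f(x) = (1 + x^2 / (2n))^(-n); n = 1 is a Lorentzian."""
    return (1.0 + x**2 / (2.0 * n)) ** (-n)


def gaussian(x):
    """The n -> infinity limit, exp(-x^2 / 2), i.e. the form of equation (14)."""
    return np.exp(-0.5 * x**2)


x = np.array([0.1, 1.0, 3.0])
for n in (1, 10, 1000):
    print(n, np.round(f_n(x, n), 4))
print("Gaussian", np.round(gaussian(x), 4))

# Central second difference at the origin: every member should give f''(0) = -1.
h = 1e-3
curvature = {n: (f_n(h, n) - 2.0 * f_n(0.0, n) + f_n(-h, n)) / h**2 for n in (1, 10, 1000)}
print(curvature)
```

Every member has $f(0) = 1$ and $f''(0) = -1$, so for small $x = \ln r_\nu/\xi$ they are nearly indistinguishable. At $x = 3$, however, the Lorentzian ($n = 1$) is still above 0.18, while the Gaussian has fallen to about 0.011.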
5. CONCLUSIONS 

When removing CMB foregrounds, one can take advantage of all ways in which they differ from CMB fluctuations.

1. Non-Gaussian behavior can be exploited to throw out severely contaminated regions (e.g., bright point sources, the Galactic plane).

2. Their frequency dependence can be exploited to subtract them out as we have described above.

3. Knowledge of their power spectra can be used by including residual foreground fluctuation amplitudes as additional free parameters when fitting the measured power spectrum to theoretical models.

The TE96 subtraction method (for step 2) has been shown (T97) to be lossless, i.e., to retain all the cosmological information, under two conditions. First, the foregrounds must be Gaussian with $\xi = \infty$. Second, the subtraction must be performed mode by mode (as suggested by TE96 and implemented by Bouchet et al. 1995) rather than pixel by pixel; the latter destroys information by not taking advantage of correlations between neighboring pixels.

In this Letter, we have studied the more realistic case $\xi < \infty$. We found that even spectral index variations as small as $\Delta\alpha = 0.1$ make a substantial difference for the choice of method and for attainable results. Complete foreground removal becomes impossible, and attempting it nonetheless by assuming $\xi = \infty$ can even be worse than no foreground removal at all. It is easy to show that the method of equation (6) is lossless under the same assumptions, for any $f$.

This means that one more property needs to be determined for each foreground component, in addition to its dependence on frequency and scale: its frequency correlations $\mathbf{R}^{(i)}$. With a simple toy model, we illustrated that this is directly linked to the spectral index dispersion $\Delta\alpha$. If we neglect sources of prior information about $\alpha$, $\Delta\alpha$ could easily be as large as 0.1 for synchrotron radiation, 0.3 for dust, 0.01 for free-free emission and 0.5 for radio sources. It has been argued that the spectral index for dust depends on Galactic latitude (e.g., Reach et al. 1995). The spectral index for synchrotron emission has been argued to correlate both with the spectral index that can be measured at lower frequencies (Brandt et al. 1995) and with the degree of synchrotron polarization (Bernstein 1992). By improving our understanding and modeling of how the foreground spectral indices vary with position, it may thus be possible to reduce the effective $\Delta\alpha$. This would improve our foreground removal and the accuracy with which cosmological parameters can be measured with the CMB.
Figure 1. Where various foregrounds dominate. The shaded regions indicate where the various foregrounds cause fluctuations exceeding those of COBE-normalized scale-invariant fluctuations, thus posing a substantial challenge to estimation of genuine CMB fluctuations. They correspond to dust (top), free-free emission (lower left), synchrotron radiation (lower left, vertically shaded) and point sources (lower and upper right). The heavy dashed line shows the frequency where the total foreground contribution to each multipole is minimal. The boxes roughly indicate the range of multipoles $\ell$ and frequencies $\nu$ probed by various CMB experiments, as in TE96.



